Overview

Dataset statistics

Number of variables: 14
Number of observations: 2227577
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 178.4 MiB
Average record size in memory: 84.0 B
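
The memory figures above are consistent with each other. As a rough check, multiplying the number of observations by the average record size should land close to the reported total. This is only a sketch: the 84.0 B figure is itself rounded, so the result is approximate.

```python
# Values copied from the dataset statistics above.
n_observations = 2_227_577
avg_record_bytes = 84.0  # reported value, rounded to one decimal

total_bytes = n_observations * avg_record_bytes
total_mib = total_bytes / 2**20  # 1 MiB = 1,048,576 bytes

print(f"{total_mib:.1f} MiB")
```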

Variable types

Numeric: 8
Categorical: 2
Boolean: 4

Alerts

[High correlation] Rating is highly correlated with Rating Count
[High correlation] Rating Count is highly correlated with Maximum Installs
[High correlation] Installs is highly correlated with Maximum Installs
[High correlation] Maximum Installs is highly correlated with Rating Count and 1 other field
[High correlation] Free is highly correlated with Price
[High correlation] Price is highly correlated with Free
[High correlation] Category is highly correlated with Content Rating and 1 other field
[High correlation] Content Rating is highly correlated with Category
[High correlation] Ad Supported is highly correlated with Category
[Skewed] Rating Count is highly skewed (γ1 = 261.321718)
[Skewed] Installs is highly skewed (γ1 = 185.3775748)
[Skewed] Maximum Installs is highly skewed (γ1 = 169.9392771)
[Skewed] Price is highly skewed (γ1 = 99.08154382)
[Uniform] df_index is uniformly distributed
[Unique] df_index has unique values
[Zeros] Rating has 1041090 (46.7%) zeros
[Zeros] Rating Count has 1041090 (46.7%) zeros
[Zeros] Price has 2185451 (98.1%) zeros
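
The skewness alerts report a coefficient γ1. A minimal sketch of one common definition, the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), is given here. The function name is hypothetical, and this is the population form; the report may use a bias-corrected sample estimator, so treat this as an illustration rather than the exact formula behind the numbers above.

```python
import math


def moment_skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return math.nan  # skewness is undefined for constant data
    return m3 / m2 ** 1.5


# A mostly-zero column with a few large values, loosely like Price.
mostly_zero = [0] * 98 + [1, 400]
print(moment_skewness(mostly_zero))
```

A symmetric sample gives a value of zero, while a long right tail, as in the zero-heavy columns flagged above, gives a large positive value.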

Reproduction

Analysis started: 2022-10-29 19:05:35.861483
Analysis finished: 2022-10-29 19:08:03.326488
Duration: 2 minutes and 27.47 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 2227577
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1156663.919
Minimum: 0
Maximum: 2312943
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 116074.8
Q1: 578364
Median: 1156444
Q3: 1734870
95-th percentile: 2197363.2
Maximum: 2312943
Range: 2312943
Interquartile range (IQR): 1156506

Descriptive statistics

Standard deviation: 667627.6839
Coefficient of variation (CV): 0.5772010978
Kurtosis: -1.200549236
Mean: 1156663.919
Median Absolute Deviation (MAD): 578252
Skewness: 9.175538382 × 10^-5
Sum: 2.576557943 × 10^12
Variance: 4.457267242 × 10^11
Monotonicity: Strictly increasing
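
Several of the figures above are derived from others, which makes them easy to cross-check. The sketch below recomputes the coefficient of variation, variance, IQR and range for df_index from the reported mean, standard deviation, quartiles and extremes. It assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared).

```python
import math

# Values copied from the df_index statistics above.
mean = 1156663.919
std = 667627.6839
q1, q3 = 578364, 1734870
minimum, maximum = 0, 2312943

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(cv, variance, iqr, value_range)
```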
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1542287 | 1 | < 0.1%
1542301 | 1 | < 0.1%
1542300 | 1 | < 0.1%
1542299 | 1 | < 0.1%
1542298 | 1 | < 0.1%
1542297 | 1 | < 0.1%
1542296 | 1 | < 0.1%
1542295 | 1 | < 0.1%
1542294 | 1 | < 0.1%
Other values (2227567) | 2227567 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2312943 | 1 | < 0.1%
2312942 | 1 | < 0.1%
2312941 | 1 | < 0.1%
2312940 | 1 | < 0.1%
2312939 | 1 | < 0.1%
2312938 | 1 | < 0.1%
2312937 | 1 | < 0.1%
2312936 | 1 | < 0.1%
2312935 | 1 | < 0.1%
2312934 | 1 | < 0.1%

Category
Categorical

HIGH CORRELATION

Distinct: 48
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 17.0 MiB

Education: 233858
Music & Audio: 152493
Business: 138291
Tools: 137839
Entertainment: 134355
Other values (43): 1430741

Length

Max length: 23
Median length: 15
Mean length: 10.4002672
Min length: 4

Characters and Unicode

Total characters: 23167396
Distinct characters: 41
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Adventure
2nd row: Tools
3rd row: Productivity
4th row: Communication
5th row: Tools

Common Values

Value | Count | Frequency (%)
Education | 233858 | 10.5%
Music & Audio | 152493 | 6.8%
Business | 138291 | 6.2%
Tools | 137839 | 6.2%
Entertainment | 134355 | 6.0%
Lifestyle | 115415 | 5.2%
Books & Reference | 114621 | 5.1%
Personalization | 87506 | 3.9%
Health & Fitness | 80742 | 3.6%
Productivity | 75522 | 3.4%
Other values (38) | 956935 | 43.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
(blank) | 618978 | 17.7%
education | 233858 | 6.7%
music | 156515 | 4.5%
audio | 152493 | 4.4%
business | 138291 | 4.0%
tools | 137839 | 4.0%
entertainment | 134355 | 3.9%
lifestyle | 115415 | 3.3%
books | 114621 | 3.3%
reference | 114621 | 3.3%
Other values (52) | 1570980 | 45.0%

Most occurring characters

Value | Count | Frequency (%)
i | 2023801 | 8.7%
e | 1901848 | 8.2%
o | 1879000 | 8.1%
n | 1731829 | 7.5%
t | 1497767 | 6.5%
s | 1488549 | 6.4%
a | 1442402 | 6.2%
(space) | 1260389 | 5.4%
u | 1003856 | 4.3%
c | 954038 | 4.1%
Other values (31) | 7983917 | 34.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 18419041 | 79.5%
Uppercase Letter | 2868988 | 12.4%
Space Separator | 1260389 | 5.4%
Other Punctuation | 618978 | 2.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 2023801 | 11.0%
e | 1901848 | 10.3%
o | 1879000 | 10.2%
n | 1731829 | 9.4%
t | 1497767 | 8.1%
s | 1488549 | 8.1%
a | 1442402 | 7.8%
u | 1003856 | 5.5%
c | 954038 | 5.2%
r | 811596 | 4.4%
Other values (13) | 3684355 | 20.0%

Uppercase Letter
Value | Count | Frequency (%)
E | 413878 | 14.4%
A | 285955 | 10.0%
B | 274425 | 9.6%
P | 271457 | 9.5%
M | 253462 | 8.8%
F | 215734 | 7.5%
T | 214105 | 7.5%
S | 191427 | 6.7%
L | 185231 | 6.5%
R | 133367 | 4.6%
Other values (6) | 429947 | 15.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1260389 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
& | 618978 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 21288029 | 91.9%
Common | 1879367 | 8.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 2023801 | 9.5%
e | 1901848 | 8.9%
o | 1879000 | 8.8%
n | 1731829 | 8.1%
t | 1497767 | 7.0%
s | 1488549 | 7.0%
a | 1442402 | 6.8%
u | 1003856 | 4.7%
c | 954038 | 4.5%
r | 811596 | 3.8%
Other values (29) | 6553343 | 30.8%

Common
Value | Count | Frequency (%)
(space) | 1260389 | 67.1%
& | 618978 | 32.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 23167396 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 2023801 | 8.7%
e | 1901848 | 8.2%
o | 1879000 | 8.1%
n | 1731829 | 7.5%
t | 1497767 | 6.5%
s | 1488549 | 6.4%
a | 1442402 | 6.2%
(space) | 1260389 | 5.4%
u | 1003856 | 4.3%
c | 954038 | 4.1%
Other values (31) | 7983917 | 34.5%

Rating
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 42
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.185477674
Minimum: 0
Maximum: 5
Zeros: 1041090
Zeros (%): 46.7%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 2.9
Q3: 4.3
95-th percentile: 4.9
Maximum: 5
Range: 5
Interquartile range (IQR): 4.3

Descriptive statistics

Standard deviation: 2.108288846
Coefficient of variation (CV): 0.9646810266
Kurtosis: -1.860825924
Mean: 2.185477674
Median Absolute Deviation (MAD): 2.1
Skewness: 0.01530924985
Sum: 4868319.8
Variance: 4.444881857
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=42)
Common values
Value | Count | Frequency (%)
0 | 1041090 | 46.7%
5 | 97999 | 4.4%
4.2 | 84568 | 3.8%
4.4 | 83043 | 3.7%
4.3 | 79955 | 3.6%
4.6 | 75664 | 3.4%
4.5 | 73993 | 3.3%
4.1 | 66769 | 3.0%
4 | 64579 | 2.9%
4.7 | 60295 | 2.7%
Other values (32) | 499622 | 22.4%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1041090 | 46.7%
1 | 704 | < 0.1%
1.1 | 233 | < 0.1%
1.2 | 514 | < 0.1%
1.3 | 559 | < 0.1%
1.4 | 974 | < 0.1%
1.5 | 1128 | 0.1%
1.6 | 1596 | 0.1%
1.7 | 1864 | 0.1%
1.8 | 2855 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
5 | 97999 | 4.4%
4.9 | 43474 | 2.0%
4.8 | 59538 | 2.7%
4.7 | 60295 | 2.7%
4.6 | 75664 | 3.4%
4.5 | 73993 | 3.3%
4.4 | 83043 | 3.7%
4.3 | 79955 | 3.6%
4.2 | 84568 | 3.8%
4.1 | 66769 | 3.0%

Rating Count
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 35394
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2005.649982
Minimum: 0
Maximum: 56025424
Zeros: 1041090
Zeros (%): 46.7%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 6
Q3: 40
95-th percentile: 1244
Maximum: 56025424
Range: 56025424
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 88039.39824
Coefficient of variation (CV): 43.89569418
Kurtosis: 112441.8315
Mean: 2005.649982
Median Absolute Deviation (MAD): 6
Skewness: 261.321718
Sum: 4467739770
Variance: 7750935643
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 1041090 | 46.7%
5 | 62997 | 2.8%
6 | 53130 | 2.4%
7 | 45766 | 2.1%
8 | 39674 | 1.8%
9 | 35103 | 1.6%
10 | 31548 | 1.4%
11 | 28474 | 1.3%
12 | 25483 | 1.1%
13 | 23036 | 1.0%
Other values (35384) | 841276 | 37.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1041090 | 46.7%
5 | 62997 | 2.8%
6 | 53130 | 2.4%
7 | 45766 | 2.1%
8 | 39674 | 1.8%
9 | 35103 | 1.6%
10 | 31548 | 1.4%
11 | 28474 | 1.3%
12 | 25483 | 1.1%
13 | 23036 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
56025424 | 1 | < 0.1%
36446381 | 1 | < 0.1%
31018623 | 1 | < 0.1%
26860860 | 1 | < 0.1%
26340056 | 1 | < 0.1%
22148032 | 1 | < 0.1%
21754025 | 1 | < 0.1%
18956756 | 1 | < 0.1%
18066559 | 1 | < 0.1%
16835698 | 1 | < 0.1%

Installs
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 108613.0744
Minimum: 0
Maximum: 1000000000
Zeros: 11173
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 50
Median: 500
Q3: 5000
95-th percentile: 100000
Maximum: 1000000000
Range: 1000000000
Interquartile range (IQR): 4950

Descriptive statistics

Standard deviation: 3679195.179
Coefficient of variation (CV): 33.87433049
Kurtosis: 44326.912
Mean: 108613.0744
Median Absolute Deviation (MAD): 495
Skewness: 185.3775748
Sum: 2.419439865 × 10^11
Variance: 1.353647717 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Common values
Value | Count | Frequency (%)
100 | 430857 | 19.3%
1000 | 386187 | 17.3%
10 | 289162 | 13.0%
10000 | 246901 | 11.1%
500 | 183387 | 8.2%
50 | 165273 | 7.4%
5000 | 138748 | 6.2%
100000 | 103887 | 4.7%
50000 | 71690 | 3.2%
5 | 70538 | 3.2%
Other values (10) | 140947 | 6.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 11173 | 0.5%
1 | 62381 | 2.8%
5 | 70538 | 3.2%
10 | 289162 | 13.0%
50 | 165273 | 7.4%
100 | 430857 | 19.3%
500 | 183387 | 8.2%
1000 | 386187 | 17.3%
5000 | 138748 | 6.2%
10000 | 246901 | 11.1%

Maximum 10 values
Value | Count | Frequency (%)
1000000000 | 16 | < 0.1%
500000000 | 33 | < 0.1%
100000000 | 367 | < 0.1%
50000000 | 622 | < 0.1%
10000000 | 5229 | 0.2%
5000000 | 5761 | 0.3%
1000000 | 30396 | 1.4%
500000 | 24969 | 1.1%
100000 | 103887 | 4.7%
50000 | 71690 | 3.2%

Maximum Installs
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 238730
Distinct (%): 10.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 192136.9274
Minimum: 0
Maximum: 2123105347
Zeros: 11173
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 7
Q1: 84
Median: 683
Q3: 7061
95-th percentile: 216818.2
Maximum: 2123105347
Range: 2123105347
Interquartile range (IQR): 6977

Descriptive statistics

Standard deviation: 5910508.481
Coefficient of variation (CV): 30.76196002
Kurtosis: 40807.05337
Mean: 192136.9274
Median Absolute Deviation (MAD): 671
Skewness: 169.9392771
Sum: 4.279998004 × 10^11
Variance: 3.493411051 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
3 | 16028 | 0.7%
2 | 15881 | 0.7%
4 | 15567 | 0.7%
6 | 15134 | 0.7%
5 | 15006 | 0.7%
1 | 14905 | 0.7%
7 | 14258 | 0.6%
8 | 13527 | 0.6%
10 | 13023 | 0.6%
9 | 12613 | 0.6%
Other values (238720) | 2081635 | 93.4%

Minimum 10 values
Value | Count | Frequency (%)
0 | 11173 | 0.5%
1 | 14905 | 0.7%
2 | 15881 | 0.7%
3 | 16028 | 0.7%
4 | 15567 | 0.7%
5 | 15006 | 0.7%
6 | 15134 | 0.7%
7 | 14258 | 0.6%
8 | 13527 | 0.6%
9 | 12613 | 0.6%

Maximum 10 values
Value | Count | Frequency (%)
2123105347 | 1 | < 0.1%
1793502218 | 1 | < 0.1%
1704495994 | 1 | < 0.1%
1682763021 | 1 | < 0.1%
1666016612 | 1 | < 0.1%
1645811582 | 1 | < 0.1%
1621265491 | 1 | < 0.1%
1616141394 | 1 | < 0.1%
1494252350 | 1 | < 0.1%
1446535469 | 1 | < 0.1%

Free
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 MiB

Value | Count | Frequency (%)
True | 2185451 | 98.1%
False | 42126 | 1.9%

Price
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1033
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1020611807
Minimum: 0
Maximum: 400
Zeros: 2185451
Zeros (%): 98.1%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 400
Range: 400
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.654431754
Coefficient of variation (CV): 26.00824072
Kurtosis: 12503.47129
Mean: 0.1020611807
Median Absolute Deviation (MAD): 0
Skewness: 99.08154382
Sum: 227349.1386
Variance: 7.046007939
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 2185451 | 98.1%
0.99 | 11172 | 0.5%
1.99 | 5412 | 0.2%
2.99 | 3621 | 0.2%
1.49 | 3570 | 0.2%
4.99 | 2327 | 0.1%
3.99 | 2260 | 0.1%
2.49 | 2085 | 0.1%
3.49 | 1205 | 0.1%
9.99 | 809 | < 0.1%
Other values (1023) | 9665 | 0.4%

ValueCountFrequency (%)
02185451
98.1%
0.1948242
 
< 0.1%
0.2047351
 
< 0.1%
0.2078898
 
< 0.1%
0.211221
 
< 0.1%
0.2633261
 
< 0.1%
0.2735421
 
< 0.1%
0.3935851
 
< 0.1%
0.4157792
 
< 0.1%
0.4490111
 
< 0.1%

Maximum 10 values
Value | Count | Frequency (%)
400 | 1 | < 0.1%
399.99 | 23 | < 0.1%
394.99 | 2 | < 0.1%
389.99 | 3 | < 0.1%
384.99 | 1 | < 0.1%
379.99 | 5 | < 0.1%
374.99 | 1 | < 0.1%
369.99 | 1 | < 0.1%
365.99 | 1 | < 0.1%
364.99 | 1 | < 0.1%

Size
Real number (ℝ≥0)

Distinct: 1647
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.17557466
Minimum: 0.0032
Maximum: 1500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0.0032
5-th percentile: 1.9
Q1: 4.9
Median: 10
Q3: 25
95-th percentile: 65
Maximum: 1500
Range: 1499.9968
Interquartile range (IQR): 20.1

Descriptive statistics

Standard deviation: 23.97999704
Coefficient of variation (CV): 1.250549069
Kurtosis: 93.7727556
Mean: 19.17557466
Median Absolute Deviation (MAD): 6.6
Skewness: 4.890497946
Sum: 42715069.07
Variance: 575.0402578
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
11 | 61873 | 2.8%
12 | 55850 | 2.5%
13 | 47819 | 2.1%
14 | 45067 | 2.0%
16 | 42251 | 1.9%
15 | 41113 | 1.8%
10 | 37484 | 1.7%
17 | 37074 | 1.7%
18 | 31534 | 1.4%
19 | 29560 | 1.3%
Other values (1637) | 1797952 | 80.7%

Minimum 10 values
Value | Count | Frequency (%)
0.0032 | 1 | < 0.1%
0.0033 | 1 | < 0.1%
0.0034 | 1 | < 0.1%
0.0046 | 1 | < 0.1%
0.0047 | 3 | < 0.1%
0.0051 | 1 | < 0.1%
0.0053 | 1 | < 0.1%
0.0058 | 1 | < 0.1%
0.0061 | 2 | < 0.1%
0.0062 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1500 | 2 | < 0.1%
1100 | 8 | < 0.1%
1020 | 1 | < 0.1%
1006 | 1 | < 0.1%
1000 | 3 | < 0.1%
996 | 1 | < 0.1%
985 | 1 | < 0.1%
981 | 1 | < 0.1%
977 | 1 | < 0.1%
963 | 1 | < 0.1%

Minimum Android
Real number (ℝ≥0)

Distinct: 23
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.324223809
Minimum: 0
Maximum: 8
Zeros: 12324
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 17.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2.3
Q1: 4.1
Median: 4.2
Q3: 5
95-th percentile: 6
Maximum: 8
Range: 8
Interquartile range (IQR): 0.9

Descriptive statistics

Standard deviation: 0.92216513
Coefficient of variation (CV): 0.2132556433
Kurtosis: 4.876242744
Mean: 4.324223809
Median Absolute Deviation (MAD): 0.2
Skewness: -0.2864179094
Sum: 9632541.5
Variance: 0.8503885271
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)
Common values
Value | Count | Frequency (%)
4.1 | 594105 | 26.7%
5 | 389792 | 17.5%
4.4 | 384164 | 17.2%
4 | 330833 | 14.9%
4.2 | 115123 | 5.2%
6 | 88965 | 4.0%
2.3 | 86088 | 3.9%
5.1 | 58698 | 2.6%
4.3 | 40532 | 1.8%
7 | 33930 | 1.5%
Other values (13) | 105347 | 4.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 12324 | 0.6%
1 | 308 | < 0.1%
1.1 | 165 | < 0.1%
1.5 | 2095 | 0.1%
1.6 | 8523 | 0.4%
2 | 3207 | 0.1%
2.1 | 16681 | 0.7%
2.2 | 23648 | 1.1%
2.3 | 86088 | 3.9%
3 | 16997 | 0.8%

Maximum 10 values
Value | Count | Frequency (%)
8 | 13705 | 0.6%
7.1 | 3036 | 0.1%
7 | 33930 | 1.5%
6 | 88965 | 4.0%
5.1 | 58698 | 2.6%
5 | 389792 | 17.5%
4.4 | 384164 | 17.2%
4.3 | 40532 | 1.8%
4.2 | 115123 | 5.2%
4.1 | 594105 | 26.7%

Content Rating
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 17.0 MiB

Everyone: 1949928
Teen: 187771
Mature 17+: 58137
Everyone 10+: 31460
Unrated: 151

Length

Max length: 15
Median length: 8
Mean length: 7.7718548
Min length: 4

Characters and Unicode

Total characters: 17312405
Distinct characters: 23
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Everyone
2nd row: Everyone
3rd row: Everyone
4th row: Everyone
5th row: Everyone

Common Values

Value | Count | Frequency (%)
Everyone | 1949928 | 87.5%
Teen | 187771 | 8.4%
Mature 17+ | 58137 | 2.6%
Everyone 10+ | 31460 | 1.4%
Unrated | 151 | < 0.1%
Adults only 18+ | 130 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot


Words

Value | Count | Frequency (%)
everyone | 1981388 | 85.5%
teen | 187771 | 8.1%
mature | 58137 | 2.5%
17 | 58137 | 2.5%
10 | 31460 | 1.4%
unrated | 151 | < 0.1%
adults | 130 | < 0.1%
only | 130 | < 0.1%
18 | 130 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 4396606 | 25.4%
n | 2169440 | 12.5%
r | 2039676 | 11.8%
y | 1981518 | 11.4%
o | 1981518 | 11.4%
E | 1981388 | 11.4%
v | 1981388 | 11.4%
T | 187771 | 1.1%
(space) | 89857 | 0.5%
1 | 89727 | 0.5%
Other values (13) | 413516 | 2.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 14725790 | 85.1%
Uppercase Letter | 2227577 | 12.9%
Decimal Number | 179454 | 1.0%
Space Separator | 89857 | 0.5%
Math Symbol | 89727 | 0.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 4396606 | 29.9%
n | 2169440 | 14.7%
r | 2039676 | 13.9%
y | 1981518 | 13.5%
o | 1981518 | 13.5%
v | 1981388 | 13.5%
t | 58418 | 0.4%
a | 58288 | 0.4%
u | 58267 | 0.4%
d | 281 | < 0.1%
Other values (2) | 390 | < 0.1%

Uppercase Letter
Value | Count | Frequency (%)
E | 1981388 | 88.9%
T | 187771 | 8.4%
M | 58137 | 2.6%
U | 151 | < 0.1%
A | 130 | < 0.1%

Decimal Number
Value | Count | Frequency (%)
1 | 89727 | 50.0%
7 | 58137 | 32.4%
0 | 31460 | 17.5%
8 | 130 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 89857 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 89727 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 16953367 | 97.9%
Common | 359038 | 2.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 4396606 | 25.9%
n | 2169440 | 12.8%
r | 2039676 | 12.0%
y | 1981518 | 11.7%
o | 1981518 | 11.7%
E | 1981388 | 11.7%
v | 1981388 | 11.7%
T | 187771 | 1.1%
t | 58418 | 0.3%
a | 58288 | 0.3%
Other values (7) | 117356 | 0.7%

Common
Value | Count | Frequency (%)
(space) | 89857 | 25.0%
1 | 89727 | 25.0%
+ | 89727 | 25.0%
7 | 58137 | 16.2%
0 | 31460 | 8.8%
8 | 130 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 17312405 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 4396606 | 25.4%
n | 2169440 | 12.5%
r | 2039676 | 11.8%
y | 1981518 | 11.4%
o | 1981518 | 11.4%
E | 1981388 | 11.4%
v | 1981388 | 11.4%
T | 187771 | 1.1%
(space) | 89857 | 0.5%
1 | 89727 | 0.5%
Other values (13) | 413516 | 2.4%

Ad Supported
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 MiB

Value | Count | Frequency (%)
True | 1115394 | 50.1%
False | 1112183 | 49.9%

In App Purchases
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 MiB

Value | Count | Frequency (%)
False | 2046587 | 91.9%
True | 180990 | 8.1%

Editors Choice
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 MiB

Value | Count | Frequency (%)
False | 2226878 | > 99.9%
True | 699 | < 0.1%

Interactions

[64 interaction plots, one per pair of numeric variables]

Correlations


Auto

The auto setting is an easily interpretable pairwise column metric that chooses a method by variable-type pair (vartype-vartype : method): categorical-categorical : Cramér's V; numerical-categorical : Cramér's V (using a discretized numerical column); numerical-numerical : Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
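
The mapping described above amounts to a small dispatch on the two column types. The sketch below only illustrates that mapping; `auto_method` and the type labels `"numerical"` and `"categorical"` are hypothetical names, not part of the profiling library's API.

```python
def auto_method(type_a: str, type_b: str) -> str:
    """Return the method the auto setting describes for a pair of column types."""
    kinds = {type_a, type_b}
    if not kinds <= {"numerical", "categorical"}:
        raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")
    if kinds == {"numerical"}:
        return "Spearman's rho"
    # Any pair involving a categorical column uses Cramér's V; a numerical
    # column in such a pair is discretized first.
    return "Cramér's V"


print(auto_method("numerical", "categorical"))
```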

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
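
The calculation just described can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. Tied values receive the mean of their rank positions, which is a common convention and an assumption here. The helper names are illustrative, and the report itself relies on a library implementation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    return covariance(rx, ry) / math.sqrt(covariance(rx, rx) * covariance(ry, ry))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(spearman_rho(x, y))
```

Because only the ranks matter, a strictly increasing but nonlinear relationship such as the cubic above still reaches the maximum value.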

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
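
A minimal sketch of that calculation follows, with a small check of the invariance under location and scale changes. The data values are made up for illustration, and the scale factors are kept positive so the sign of r is preserved.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Illustrative data only, not taken from the profiled dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```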

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
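
The pair-counting definition above can be written directly as a brute-force loop over all pairs of observations. Note the assumption: this is the plain variant (often called τ-a), which ignores ties in the numerator and does not adjust the denominator for them, whereas library implementations frequently default to a tie-adjusted variant. The quadratic loop is for illustration and would be far too slow for two million rows.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables disagree on the order
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```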

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
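
As an illustration of the bias-corrected measure, the sketch below computes it from a contingency table of counts, using the commonly cited form of the correction: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at zero, and the row and column counts r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). The function name is hypothetical, and whether the report's implementation matches this term for term is an assumption.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[25, 25], [25, 25]]))  # independent counts
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated counts
```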

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

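(row) | df_index | Category | Rating | Rating Count | Installs | Maximum Installs | Free | Price | Size | Minimum Android | Content Rating | Ad Supported | In App Purchases | Editors Choice
0 | 0 | Adventure | 0.0 | 0.0 | 10 | 15 | True | 0.0 | 10.0 | 7.1 | Everyone | False | False | False
1 | 1 | Tools | 4.4 | 64.0 | 5000 | 7662 | True | 0.0 | 2.9 | 5.0 | Everyone | True | False | False
2 | 2 | Productivity | 0.0 | 0.0 | 50 | 58 | True | 0.0 | 3.7 | 4.0 | Everyone | False | False | False
3 | 3 | Communication | 5.0 | 5.0 | 10 | 19 | True | 0.0 | 1.8 | 4.0 | Everyone | True | False | False
4 | 4 | Tools | 0.0 | 0.0 | 100 | 478 | True | 0.0 | 6.2 | 4.1 | Everyone | False | False | False
5 | 5 | Social | 0.0 | 0.0 | 50 | 89 | True | 0.0 | 46.0 | 6.0 | Teen | False | True | False
6 | 6 | Libraries & Demo | 4.5 | 12.0 | 1000 | 2567 | True | 0.0 | 2.5 | 4.1 | Everyone | True | False | False
7 | 7 | Lifestyle | 2.0 | 39.0 | 500 | 702 | True | 0.0 | 16.0 | 5.0 | Everyone | False | False | False
8 | 8 | Communication | 0.0 | 0.0 | 10 | 18 | True | 0.0 | 1.3 | 4.4 | Teen | False | False | False
9 | 9 | Personalization | 4.7 | 820.0 | 50000 | 62433 | True | 0.0 | 3.5 | 4.1 | Everyone | True | False | False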

Last rows

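(row) | df_index | Category | Rating | Rating Count | Installs | Maximum Installs | Free | Price | Size | Minimum Android | Content Rating | Ad Supported | In App Purchases | Editors Choice
2227567 | 2312934 | Education | 0.0 | 0.0 | 5 | 6 | True | 0.0 | 3.6 | 4.0 | Everyone | True | False | False
2227568 | 2312935 | Personalization | 0.0 | 0.0 | 1000 | 1302 | True | 0.0 | 29.0 | 4.1 | Everyone | True | False | False
2227569 | 2312936 | Business | 0.0 | 0.0 | 100 | 353 | True | 0.0 | 21.0 | 5.0 | Everyone | False | False | False
2227570 | 2312937 | Education | 0.0 | 0.0 | 5 | 7 | True | 0.0 | 6.6 | 4.4 | Everyone | False | False | False
2227571 | 2312938 | Education | 3.4 | 17.0 | 1000 | 1980 | True | 0.0 | 10.0 | 4.1 | Everyone | True | False | False
2227572 | 2312939 | Role Playing | 4.3 | 16775.0 | 100000 | 337109 | True | 0.0 | 77.0 | 4.1 | Teen | False | False | False
2227573 | 2312940 | Education | 0.0 | 0.0 | 100 | 430 | True | 0.0 | 44.0 | 4.1 | Everyone | False | False | False
2227574 | 2312941 | Education | 0.0 | 0.0 | 100 | 202 | True | 0.0 | 29.0 | 5.0 | Everyone | False | False | False
2227575 | 2312942 | Music & Audio | 3.5 | 8.0 | 1000 | 2635 | True | 0.0 | 10.0 | 5.0 | Everyone | True | False | False
2227576 | 2312943 | Trivia | 5.0 | 12.0 | 100 | 354 | True | 0.0 | 5.2 | 5.0 | Everyone | True | False | False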